Conversation
| @@ -91,9 +91,6 @@ impl<T: Transaction> BlockGasLimitProcessor<T> { | |||
| } else { | |||
There was a problem hiding this comment.
Medium: this still enables hot-state accumulation under the existing add_block_limit_outcome_onchain gate even though this PR intentionally changes how to_make_hot is derived. Once hotness_in_epilogue is turned on, mixed-version validators will compute different promotion sets for the same block and serialize different BlockEpiloguePayload::V2 contents, so this rollout is not safe without a separate gate for the new derivation.
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Open findings:
- Rollout gating for the new hot-state derivation remains unresolved
- Sequential BCS-fallback discards can still affect hot-state promotion
Sent by Cursor Automation: Security Review Bot
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Open findings:
- Mixed-version rollout divergence in hot-state derivation
- Discarded transactions still influence hot-state promotions
Sent by Cursor Automation: Security Review Bot
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 2 issues that need to be addressed from previous scan.
Open findings:
- Mixed-version rollout divergence in hot-state derivation
- Discarded transactions still influence hot-state promotions
Sent by Cursor Automation: Security Review Bot
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.
Open findings:
- Mixed-version rollout divergence in hot-state derivation
- Sequential BCS-fallback discards can still affect hot-state promotion
- HashSet iteration can make hot-state promotions nondeterministic
- Discarded transactions still influence hot-state promotions
Sent by Cursor Automation: Security Review Bot
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.
Open findings:
- Rollout gating for the new hot-state derivation remains unresolved
- HashSet iteration can make hot-state promotions nondeterministic
- Discarded transactions still influence hot-state promotions
- Nested respawned sessions still do not contribute data reads to the final hot-state read set
Sent by Cursor Automation: Security Review Bot
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 5 issues that need to be addressed from previous scan.
Open findings:
- Mixed-version rollout divergence in hot-state derivation
- Sequential BCS-fallback discards can still affect hot-state promotion
- HashSet iteration can make hot-state promotions nondeterministic
- Discarded transactions still influence hot-state promotions
- Respawned-session reads are still missing from the recorded hot-state read set
Sent by Cursor Automation: Security Review Bot
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.
Open findings:
- Mixed-version rollout divergence in hot-state derivation
- Sequential BCS-fallback discards can still affect hot-state promotion
- Discarded transactions still influence hot-state promotions
- Respawned-session reads are still missing from the recorded hot-state read set
Sent by Cursor Automation: Security Review Bot
There was a problem hiding this comment.
Stale comment
Aptos Security Bugbot has reviewed your changes, there are still 4 issues that need to be addressed from previous scan.
Open findings:
- Rollout gating for the new hot-state derivation remains unresolved
- HashSet iteration can make hot-state promotions nondeterministic
- Discarded transactions still influence hot-state promotions
- Nested respawned sessions still do not contribute data reads to the final hot-state read set
Sent by Cursor Automation: Security Review Bot
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Profiling shows contention on the sharded `RwLock`s in the state key registry. - Bump shard counts from 8 to 32 (the ceiling for which arrays still derive `Default`) to spread distinct keys across more locks. Position shards stay at 8 since that key space is bounded by a single exchange's universe; raw shards are test-only. - Wrap each shard in `CachePadded`, matching mvhashmap and the VM code caches. A shard is ~80 bytes, so adjacent shards shared a cache line and even read-lock traffic bounced neighboring lock words. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
There is going to be a bit of perf cost for the recording. I've done a few small optimizations, but they won't eliminate all regressions. I'm going to land this now and we can follow up more if needed. |
| let to_make_hot = self.get_keys_to_make_hot(); | ||
| if self.hot_state_op_accumulator.is_some() { | ||
| counters::HOT_STATE_PROMOTIONS_PER_BLOCK.observe(to_make_hot.len() as f64); | ||
| } | ||
| TBlockEndInfoExt::new(inner, to_make_hot) |
There was a problem hiding this comment.
Medium: to_make_hot is now serialized into BlockEndInfoExt, but the sharded execution path still never materializes those promotions back into the block-epilogue output. by_transaction_execution_unsharded() patches each BlockEpilogue with output.add_hotness(...) before parsing, while by_transaction_execution_sharded() goes straight from the sharded VM outputs to Parser::parse(). Any node re-executing the same block through the sharded path therefore drops the epilogue hot-state writes that the unsharded path applies, so HOTNESS_IN_EPILOGUE can produce different hot-state updates and checkpoint hashes depending on which execution path is used.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
The hot state accumulator kept `to_make_hot` in a `BTreeSet`, so on the sequential commit path every recorded read paid O(log n) `StateKey` comparisons, each dereferencing the registry `Arc` and comparing key contents. Hashed inserts are much cheaper (`Hash` reuses the precomputed key hash and `Eq` is `Arc` pointer equality), and deterministic order only matters once per block, when the promotion cap picks the surviving subset — so keep the set unordered and have `get_keys_to_make_hot` sort the eligible keys before applying the cap, selecting the same subset regardless of read/iteration order. Insertion also clones a key only on first sighting; repeated keys (those touched by every transaction in the block) are the common case. On the single-node benchmark this is worth a few percent mean TPS on cheap workloads.
The mixed-version suites — compat, framework upgrade, and `k8s_test_suite` (which runs the same `FrameworkUpgrade` test) — run genesis with default features, which enable `HOTNESS_IN_EPILOGUE` while mainnet and testnet still have it off. An upcoming change to how the promoted set is derived makes old and new binaries disagree on the hotness portion of transaction outputs when the flag is on, so pin it off to keep these suites aligned with the networks they are meant to mirror. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Hot state promotions (the `to_make_hot` set in the block epilogue) are part of consensus, so their inputs must be deterministic. This reworks how that set is derived. ## Why the old derivation was problematic Promotions were fed from BlockSTM's read/write summary, which exists for conflict detection, not for this: - **Path-dependent.** The summary differs between parallel and sequential execution (e.g. aggregator v1 reads served by delta resolution are dropped in parallel but kept in sequential), so the same block could promote different keys depending on how it happened to execute — a divergence in a consensus-agreed artifact. - **Coupled to gas config.** It was only populated inside the `conflict_penalty_window` branch, so promotions silently depended on an unrelated block-gas knob. - **Incomplete write exclusion.** Its write side misses in-place delayed-field rewrites, aggregator v1 writes/deltas and module writes, so a key the block writes could still be promoted by the epilogue even though the write already makes it hot. ## Approach Record the read set at the VM boundary, where it is a pure function of the transaction and the pre-state — identical across parallel and sequential execution and independent of gas config: - `StorageAdapter` records resource, resource-group, table-item, aggregator v1 and config reads. Respawned sessions share one recorder so all of a transaction's sessions accumulate into a single set. - `ReadRecordingCodeStorage` wraps the code storage and records module fetches. It sits above the global module cache, so a module is recorded whether served from that cache, the per-block cache or storage. A script's declared module dependencies are recorded from the loaded script, so recording does not depend on the verified-script cache, whose warmth is execution-schedule dependent. - A transaction's output carries the data and module keys as two collections; they are disjoint by construction, so consumers iterate a chained view and nothing pays to merge them. - Written keys are enumerated directly from the change set, covering every write kind the conflict summary missed. ## Worth calling out The new set intentionally differs from the old one — e.g. `exists<T>` now loads the resource and counts as a read — so it ships behind the existing hotness feature flag, which the mixed-version forge suites keep off (see the preceding forge commit) to avoid old/new nodes disagreeing on transaction output. Adds a per-block promotions histogram and tests covering sequential/parallel parity, each read and write kind, script dependencies, and discard handling. Follow-ups: read-kind/hotness tagging and an on-chain, byte-based promotion cap. ## Performance The recording runs unconditionally, so its per-transaction cost has to stay small. Even a trivial transaction fetches 10+ framework modules and re-reads the same resources across its sessions, so the hot paths are engineered around allocation and clone traffic: - The interpreter fetches the same module many times in a row. A last-recorded fast path (a reused string buffer, so module switches allocate nothing) skips the bookkeeping for such runs, and the recording wrapper's delegators are `#[inline]` so wrapping does not deoptimize the module fetch path. - All recording sets are keyed by `StateKey` — which carries a precomputed 32-byte hash — under FxHash (via the maintained `rustc-hash`), pre-sized past the typical framework-module count, and keys are cloned only on first sighting; repeats, e.g. framework modules touched by every transaction, are the common case. Measured on the single-node benchmark with interleaved paired runs, together with the preceding `to_make_hot` ordering commit: ~2-2.5% net mean TPS cost across representative workloads, and ~0.3-0.4us per transaction on the sequential move-e2e micros. An ablation build with recording no-oped attributes the entire cost to the recording machinery itself (diffuse per-transaction set and clone work rather than any single hot symbol) and confirms the accumulator commit offsets part of it. Candidate follow-ups if more needs to come back: a per-thread module-key memo to keep the registry lock off the extraction path, pointer-hashed recorder sets (`StateKey` equality is already pointer identity), pooling the per-transaction sets, and an indexed key registry to cut refcount traffic on hot shared keys. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
| let to_make_hot = self.get_keys_to_make_hot(); | ||
| if self.hot_state_op_accumulator.is_some() { | ||
| counters::HOT_STATE_PROMOTIONS_PER_BLOCK.observe(to_make_hot.len() as f64); | ||
| } | ||
| TBlockEndInfoExt::new(inner, to_make_hot) |
There was a problem hiding this comment.
Medium: output-based replay still drops block-epilogue hotness.
by_transaction_output() passes stored TransactionOutputs straight into Parser::parse(), but WriteSetV0 still skips hotness in serde and this replay path never rehydrates it from the BlockEpilogue payload. Live unsharded execution does that patch-up in by_transaction_execution_unsharded() before parsing. Chunks that include block epilogues can therefore recompute a different hot-state update set than live execution, and once hot_state_root_in_txn_info is enabled the replayed hot-state checkpoint hash will diverge from the committed TransactionInfo.
There was a problem hiding this comment.
Aptos Security Bugbot has reviewed your changes, there are still 10 issues that need to be addressed from previous scan.
Open findings:
- Rollout gating mismatch for new hotness derivation
- Retry outputs still feed hot-state promotion
- BCS-fallback discards still count toward block-limit accounting
- TRANSACTION_INFO_V1 can be enabled before hotness is serialized
- Pre-rollout read-set retention still creates a CPU and memory amplification path
- Internal config lookups contaminate the transaction read set
- Full retained hot-state union can still exceed the final promotion cap in memory
- Transitive eager-loading dependency module reads are still missed
- Staged init_module execution can still bypass module-read recording
- Sharded re-execution still does not rematerialize epilogue hotness
Sent by Cursor Automation: Security Review Bot
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
✅ Forge suite
|


Hot state promotions (the
to_make_hotset in the block epilogue) arepart of consensus, so their inputs must be deterministic. This reworks how
that set is derived.
Why the old derivation was problematic
Promotions were fed from BlockSTM's read/write summary, which exists for
conflict detection, not for this:
execution (e.g. aggregator v1 reads served by delta resolution are
dropped in parallel but kept in sequential), so the same block could
promote different keys depending on how it happened to execute — a
divergence in a consensus-agreed artifact.
conflict_penalty_windowbranch, so promotions silently depended on anunrelated block-gas knob.
delayed-field rewrites, aggregator v1 writes/deltas and module writes, so
a key the block writes could still be promoted by the epilogue even
though the write already makes it hot.
Approach
Record the read set at the VM boundary, where it is a pure function of the
transaction and the pre-state — identical across parallel and sequential
execution and independent of gas config:
StorageAdapterrecords resource, resource-group, table-item, aggregatorv1 and config reads. Respawned sessions share one recorder so all of a
transaction's sessions accumulate into a single set.
ReadRecordingCodeStoragewraps the code storage and records modulefetches. It sits above the global module cache, so a module is recorded
whether served from that cache, the per-block cache or storage. A
script's declared module dependencies are recorded from the loaded
script, so recording does not depend on the verified-script cache, whose
warmth is execution-schedule dependent.
collections; they are disjoint by construction, so consumers iterate a
chained view and nothing pays to merge them.
write kind the conflict summary missed.
Worth calling out
The new set intentionally differs from the old one — e.g.
exists<T>nowloads the resource and counts as a read — so it ships behind the existing
hotness feature flag, which the mixed-version forge suites keep off (see
the preceding forge commit) to avoid old/new nodes disagreeing on
transaction output. Adds a per-block promotions histogram and tests
covering sequential/parallel parity, each read and write kind, script
dependencies, and discard handling. Follow-ups: read-kind/hotness tagging
and an on-chain, byte-based promotion cap.
Performance
The recording runs unconditionally, so its per-transaction cost has to
stay small. Even a trivial transaction fetches 10+ framework modules and
re-reads the same resources across its sessions, so the hot paths are
engineered around allocation and clone traffic:
last-recorded fast path (a reused string buffer, so module switches
allocate nothing) skips the bookkeeping for such runs, and the
recording wrapper's delegators are
#[inline]so wrapping does notdeoptimize the module fetch path.
StateKey— which carries aprecomputed 32-byte hash — under FxHash (via the maintained
rustc-hash), pre-sized past the typical framework-module count, andkeys are cloned only on first sighting; repeats, e.g. framework modules
touched by every transaction, are the common case.
Measured on the single-node benchmark with interleaved paired runs,
together with the preceding
to_make_hotordering commit: ~2-2.5% netmean TPS cost across representative workloads, and ~0.3-0.4us per
transaction on the sequential move-e2e micros. An ablation build with
recording no-oped attributes the entire cost to the recording machinery
itself (diffuse per-transaction set and clone work rather than any single
hot symbol) and confirms the accumulator commit offsets part of it.
Candidate follow-ups if more needs to come back: a per-thread module-key
memo to keep the registry lock off the extraction path, pointer-hashed
recorder sets (
StateKeyequality is already pointer identity), poolingthe per-transaction sets, and an indexed key registry to cut refcount
traffic on hot shared keys.
Co-Authored-By: Claude Fable 5 noreply@anthropic.com
Note
High Risk
Changes consensus-visible hot-state promotion inputs and adds always-on per-transaction read recording on the execution path; mitigated by the existing hotness feature flag and forge disabling it for mixed-version tests.
Overview
Hot state promotion (
to_make_hotin the block epilogue) no longer derives from BlockSTM’s conflict read/write summary. The executor now builds the promotion set from VM-recorded reads and change-set writes, so the result does not depend on parallel vs sequential scheduling, conflict-penalty gas config, or incomplete write coverage in the old summary.Recording at the VM boundary:
StorageAdapteraccumulates data reads (resources, resource groups, tables, aggregator v1, configs) in a sharedReadRecorderacross respawned sessions. Block execution wraps code storage inReadRecordingCodeStorage, which records module fetches on everyModuleStoragepath. Script transactions additionally record immediate module dependencies from loaded bytecode viarecord_module_read, avoiding schedule-dependent verified-script cache warmth.Outputs and block aggregation: Committed transactions attach an
UnorderedReadSet(separate data and moduleStateKeysets) onAptosTransactionOutput; discards drop reads.BeforeMaterializationOutputexposesstorage_keys_readandstorage_keys_written(the latter from resource/module/aggregator write sets, including kinds the conflict summary missed).BlockHotStateOpAccumulatoris fed in commit order viaaccumulate_hot_state_rw, not fromReadWriteSummaryduring fee accumulation.Supporting changes:
BlockHotStateOpAccumulatoruseshashbrown::HashSetfor eligible reads and sorts when applying the per-block cap; a promotions-per-block histogram is added.StateKeyregistry shards increase and useCachePaddedfor less lock contention. Move VMLayoutCache/ kill-switch traits are delegatable so the recording wrapper can forward. Forge mixed-version suites disableHOTNESS_IN_EPILOGUEto avoid cross-binary output disagreements.New e2e tests assert sequential/parallel parity and coverage for read kinds, write exclusion, scripts, discards, and republished modules.
Reviewed by Cursor Bugbot for commit 1fb64fe. Bugbot is set up for automated code reviews on this repo. Configure here.